SUMMARY


Let $X_i$, $i=1,\ldots,k$, be independent Bernoulli variables, with
$X_i \sim B(1,p_i)$. Let $Y = \sum X_i$, and consider a multinomial
experiment based on $n$ independent, identically distributed observations
$Y_1,\ldots,Y_n$. This model is identifiable in the ordered parameter vector
$(p_{(1)},\ldots,p_{(k)})$, where $p_{(1)} \le \cdots \le p_{(k)}$, and
arises in reliability experiments in which $k$ components in parallel have
potentially different probabilities of failure. The family of multinomial
distributions with the structure described above is properly contained in
the family of general multinomial distributions with $(k+1)$ classes.
Maximum likelihood estimation for this family is considered, and it is
shown that for sufficiently large $n$, the maximum likelihood estimate of
$(p_{(1)},\ldots,p_{(k)})$ may be identified with high probability from the
roots of a $k$th degree polynomial whose coefficients are consistent
estimates of the elementary symmetric functions of the ratios
$\theta_{(i)} = p_{(i)}/(1-p_{(i)})$. A simulation study for the case
$k = 2$ sheds light on the sample size required.


INTRODUCTION 


Let $X_i$, $i=1,\ldots,k$, be independent Bernoulli random variables with
potentially different probabilities of success $p_i$, $i=1,\ldots,k$. We
denote this situation by $X_i \sim B(1,p_i)$, $i=1,\ldots,k$. Let
$Y = \sum_{i=1}^{k} X_i$, and assume that a random sample
$Y_1, Y_2, \ldots, Y_n$ is available. The common distribution of these
$Y$'s is the $k$-fold convolution, to be denoted
$\mathop{\ast}_{i=1}^{k} B(1,p_i)$. This note concerns the estimation of
the parameters of this convolution based on the $Y$ sample via the method
of maximum likelihood.

Estimation problems for convolution models have been considered by
several authors. For example, Gaffey (1959) constructed a consistent
estimator for the distribution of one component of a continuous
convolution model under the assumption that the distribution of the second
component was known. Sclove and Van Ryzin (1969) derived method of moments
estimators for a variety of convolution models. Maximum likelihood
estimation has met substantial resistance for convolution models due to
the cumbersome nature of the likelihood function, which, for discrete
components, consists of a product of sums of products of component
probabilities. Samaniego (1976) used a characterization of convoluted
Poisson distributions to facilitate maximum likelihood estimation of the
Poisson parameter. A similar approach was taken by Samaniego (1977) for
maximum likelihood estimation in convoluted binomial distributions. Both
of these studies dealt with one-parameter models in which one component of
the convolution has a known distribution. In general, maximum likelihood
estimation for multiparameter problems (with the exception of the problem
treated here) has as yet proven intractable.

The convolution $\mathop{\ast}_{i=1}^{k} B(1,p_i)$ is not well defined for
estimation purposes, since the model is not identifiable in the parameter
vector $\mathbf{p} = (p_1,\ldots,p_k)$. It is clear that any permutation of
the components of $\mathbf{p}$ gives rise to the same distribution. It is
easy to verify that this is precisely the extent of multiplicity in the
model, and that the model is therefore identifiable in the ordered
parameter vector $(p_{(1)}, p_{(2)}, \ldots, p_{(k)})$, where
$p_{(i-1)} \le p_{(i)}$. We will consider estimation of the ordered
parameter vector.
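The distribution of $Y$ and its invariance under permutations of the $p_i$ can be checked numerically. The following is a minimal sketch, not part of the original note; the function name `convolution_pmf` is hypothetical, and the probabilities are built by repeatedly convolving two-point distributions with NumPy.

```python
import numpy as np

def convolution_pmf(p):
    """Return [P(Y=0), ..., P(Y=k)] for Y = X_1 + ... + X_k, X_i ~ B(1, p_i).

    Hypothetical helper for illustration only.
    """
    pmf = np.array([1.0])
    for p_i in p:
        # Convolving with the two-point pmf (1 - p_i, p_i) adds one Bernoulli term.
        pmf = np.convolve(pmf, [1.0 - p_i, p_i])
    return pmf

# Any permutation of the parameters gives the same distribution for Y.
print(convolution_pmf([0.2, 0.5]))
print(convolution_pmf([0.5, 0.2]))
```

For $(p_1, p_2) = (0.2, 0.5)$ the three probabilities are 0.4, 0.5 and 0.1, whichever order the parameters are supplied in, which is the lack of identifiability described above.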
The model $\mathop{\ast}_{i=1}^{k} B(1,p_i)$ arises naturally in
reliability experiments. Suppose $k$ components are operating
independently in an $r$ out of $k$ system, and their probabilities of
operating successfully over a specified time period are $p_i$,
$i=1,\ldots,k$. The four tires of an automobile yield an example of
components whose life lengths are independent, but are not identically
distributed because of the different stresses experienced due to tire
location. An $r$ out of $k$ system is only as reliable as its best
component, and thus it may be of interest to estimate the ordered
parameter vector. The model we deal with assumes that no subsystem
information is available, that is, the $X$ variables from which the
observable $Y$ is constructed are themselves unobservable. Such an
assumption is realized in many biological or engineering systems.

A number of authors of reliability texts have discussed the model
$\mathop{\ast}_{i=1}^{k} B(1,p_i)$ as an introductory example (see, for
instance, Barlow and Proschan (1975) p. 20 ff.). While many properties of
the model are quite well known, inference questions remain largely
uninvestigated. The estimation problem at hand is tangentially related to
the estimation of parameters of mixtures of binomial distributions studied
by Blischke (1964), and to estimation under order restrictions studied
extensively in Barlow et al. (1972). The problem does not seem to benefit,
however, from either of the approaches used in these studies. An
estimation problem for this model with $k = 2$ was considered by Buehler
(1957). In that paper, subsystem information was assumed available and a
minimum width confidence interval for the reliability of a series system
was obtained. Our own interest in the problem considered here originated
from an attempt to derive the maximum likelihood estimate of the parameter
vector $(p_1,p_2)$ in the convolution $B(N,p_1) \ast B(M,p_2)$ of two
binomial distributions. It is interesting that this model is subsumed by
the model $\mathop{\ast}_{i=1}^{k} B(1,p_i)$ for $k = N + M$, so that the
comments developed here in fact apply to the binomial convolution. The
approach taken here is inefficient for the latter model, however, because
it ignores the special structure of the binomial convolution.


II. THE CASE k = 2

We summarize in this section the derivation of the maximum likelihood
estimates of the ordered parameters $p_{(1)}$ and $p_{(2)}$ for a
two-component system. The character of the general problem can be gleaned
from this case. Moreover, the solution for $k = 2$ is complete, whereas in
the general problem, we consider only certain important special cases. For
the case with $k = 2$, let $n_i$ denote the observed frequency of the event
$Y = i$, for $i=0,1,2$, in the sample $Y_1,\ldots,Y_n$. The likelihood
function is given by

$$
L(n_0,n_1,n_2;\,p_1,p_2) = \frac{n!}{n_0!\,n_1!\,n_2!}
\bigl[(1-p_1)(1-p_2)\bigr]^{n_0}
\bigl[p_1(1-p_2)+p_2(1-p_1)\bigr]^{n_1}
\bigl[p_1 p_2\bigr]^{n_2}.
$$


The maximum likelihood estimate of $(p_{(1)},p_{(2)})$ is obtained from
separate maximization problems for various possible data configurations,
and is displayed in the table below:
In this estimation problem, the MLE is restricted to the simplex in the
plane bounded by the lines $p_{(1)} = 0$, $p_{(2)} = 1$ and
$p_{(1)} = p_{(2)}$. The first seven cases in Table I are boundary
solutions. Even in the straightforward problem examined in this section,
however, boundary solutions require some work. For example, in Case 4
above, the likelihood maximized on the boundary $p_{(2)} = 1$ takes on the
value
The fact that $L_1 > L_2$ may be inferred from the fact that the function
$(1 + y/x)^x$, for fixed $y$, increases to $e^y$ as $x$ increases from zero
to infinity. In the general problem, the simplex over which the likelihood
is maximized is bounded by a multitude of hyperplanes, and boundary
searches for the MLE are at the very least quite tedious.

Let us view the estimation problem from another perspective. Suppose we
look for the MLE in terms of the basic multinomial probabilities
$(P(Y=0), P(Y=1))$ subject to the constraints imposed by the model. We
illustrate the outcome in the figures below, in which the dot represents
the location of the unconstrained MLE $(n_0/n, n_1/n)$ and the box
represents (albeit oversimplified) the constrained parameter space.

FIGURES I-IV (axes: P(Y=0) and P(Y=1))


We make several observations. First, we notice that the unconstrained MLE
may lie on the boundary or in the interior of the parameter space, and may
be inside or outside of the constrained space. When it lies outside the
constrained space, it is easy to argue that the constrained MLE will lie
on the boundary of the constrained space. When the unconstrained MLE lies
within the constrained space, the MLE for $(p_{(1)},p_{(2)})$ may be
obtained, at least in theory, by the invariance property of MLE's. Our
final observation is one which we will make more precise in our treatment
of the general problem. We note that among the seven cases in which the
nonzero $n_i$ are consecutive integers, only in Case 7 does the
unconstrained MLE lie outside of the constrained space. We thus conclude
that the MLE for $(p_{(1)},p_{(2)})$ may be found by the invariance
property of MLE's in almost all cases in which the integers $i$ for which
$n_i$ is nonzero are consecutive.
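When $n_0$, $n_1$ and $n_2$ are all positive, the invariance argument can be made concrete. Setting $P(Y=i) = n_i/n$ and writing $\theta = p/(1-p)$ gives $\theta_1 + \theta_2 = n_1/n_0$ and $\theta_1\theta_2 = n_2/n_0$, so the $\theta$'s are the roots of $n_0 x^2 - n_1 x + n_2 = 0$, which are real exactly when $n_1^2 \ge 4 n_0 n_2$. The sketch below illustrates this under those assumptions; the function name `mle_two_components` is hypothetical and the boundary configurations are deliberately not handled.

```python
import math

def mle_two_components(n0, n1, n2):
    """Try to obtain (p_(1), p_(2)) by invariance when n0, n1, n2 > 0.

    Returns the ordered estimates, or None when the quadratic
    n0*x^2 - n1*x + n2 has no real roots (invariance fails).
    Hypothetical helper for illustration only.
    """
    if min(n0, n1, n2) <= 0:
        raise ValueError("this sketch covers only the case n0, n1, n2 > 0")
    disc = n1 * n1 - 4 * n0 * n2
    if disc < 0:
        return None
    root = math.sqrt(disc)
    # Both roots are positive because their sum and product are positive.
    thetas = ((n1 - root) / (2 * n0), (n1 + root) / (2 * n0))
    # Map theta = p / (1 - p) back to p = theta / (1 + theta).
    return tuple(t / (1 + t) for t in thetas)

print(mle_two_components(8, 34, 8))    # roots 0.25 and 4 give (0.2, 0.8)
print(mle_two_components(20, 10, 20))  # negative discriminant: None
```

The second call is a configuration in which every $n_i$ is positive yet the unconstrained estimate cannot be produced by any pair $(p_{(1)}, p_{(2)})$.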


III.  THE  GENERAL  CASE 


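Let $Y_1,\ldots,Y_n$ be i.i.d. according to the distribution
$\mathop{\ast}_{i=1}^{k} B(1,p_i)$, and let $n_i$ be the observed frequency
of the event $Y = i$ for $i=0,1,\ldots,k$. The likelihood function may be
written as

$$
L = \frac{n!}{\prod_{i=0}^{k} n_i!}
\Bigl[\prod_{i=1}^{k}(1-p_i)\Bigr]^{n_0}
\Bigl[\sum_{i=1}^{k} p_i \prod_{j\ne i}(1-p_j)\Bigr]^{n_1}
\cdots
\Bigl[\prod_{i=1}^{k} p_i\Bigr]^{n_k}.
\tag{3.1}
$$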


Maximizing $L$ with respect to $\mathbf{p}$ is a difficult problem for
several reasons. First, there are $2^{k+1} - 1$ different data
configurations (where certain $n_i$ are zero and the rest are positive),
and the function $L$ to be maximized is different for each. Secondly, the
likelihood equations $\partial L/\partial p_i = 0$, $i=1,\ldots,k$, form a
system of $k$ equations each of which involves all $k$ parameters in a
nontrivial way. Another approach to maximum likelihood estimation is
maximization of $L$ with respect to the multinomial probabilities
$\{P(Y=i)\}$, subject to the constraints on these probabilities imposed by
the model. The difficulty with Lagrangian maximization in this problem is
that the constraints are extremely complex and, for practical purposes,
defy description. We will pursue a third approach, one that cannot be
guaranteed to produce the MLE for any fixed sample size $n$, but which
produces the MLE with limiting probability one when the parameters $p_i$,
$i=1,\ldots,k$, are distinct.

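We first consider data configurations for which
$\prod_{i=0}^{k} n_i > 0$. We attempt to find the MLE for the ordered
parameter vector $(p_{(1)},\ldots,p_{(k)})$ by the invariance property of
MLE's, that is, by solving the system of equations

$$
\begin{aligned}
\frac{n_0}{n} &= P(Y=0) = \prod_{i=1}^{k} (1-p_{(i)}) \\
\frac{n_1}{n} &= P(Y=1) = \sum_{i=1}^{k} p_{(i)} \prod_{j \ne i} (1-p_{(j)}) \\
&\;\;\vdots \\
\frac{n_k}{n} &= P(Y=k) = \prod_{i=1}^{k} p_{(i)}
\end{aligned}
\tag{3.2}
$$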
The system (3.2) consists of $k+1$ equations, but is determined by any $k$
of them. We divide the last $k$ equations by the first to obtain an
equivalent system

$$
\begin{aligned}
\sum_{i=1}^{k} \frac{p_{(i)}}{1-p_{(i)}} &= \frac{n_1}{n_0} \\
\sum_{i<j} \frac{p_{(i)}}{1-p_{(i)}} \cdot \frac{p_{(j)}}{1-p_{(j)}} &= \frac{n_2}{n_0} \\
&\;\;\vdots \\
\prod_{i=1}^{k} \frac{p_{(i)}}{1-p_{(i)}} &= \frac{n_k}{n_0}
\end{aligned}
\tag{3.3}
$$

We recognize the left hand side of (3.3) as the elementary symmetric
functions in $\bigl\{p_{(i)}/(1-p_{(i)})\bigr\}$, which implies that if the
system (3.3) has a solution, it is unique and may be obtained as

$$
\hat p_{(i)} = \frac{\hat\theta_{(i)}}{1+\hat\theta_{(i)}},
\qquad i=1,\ldots,k,
\tag{3.4}
$$

where $\hat\theta_{(i)}$, $i=1,\ldots,k$, are the ordered roots of the
polynomial

$$
p(x) = \sum_{i=0}^{k} (-1)^i\, n_i\, x^{k-i}.
\tag{3.5}
$$

The maximum likelihood estimate of $(p_{(1)},\ldots,p_{(k)})$ may be
obtained from (3.4) only when the polynomial $p(x)$ has $k$ nonnegative
roots. It is of course possible for $p(x)$ to have some complex roots, or
for some of the roots of $p(x)$ to be negative. However, one can show that
when $0 < p_{(1)} < p_{(2)} < \cdots < p_{(k)} < 1$,

$$
\lim_{n\to\infty} P\Bigl(\Bigl\{\prod_{i=0}^{k} n_i > 0\Bigr\} \cap
\bigl\{p(x) \text{ has } k \text{ roots} > 0\bigr\}\Bigr) = 1.
\tag{3.6}
$$
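Equations (3.4) and (3.5) translate directly into a root-finding computation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `mle_by_invariance` is hypothetical, `numpy.roots` stands in for any standard polynomial root finder, and the tolerance used to decide whether a computed root is real is an arbitrary illustrative choice.

```python
import numpy as np

def mle_by_invariance(counts, tol=1e-9):
    """Attempt the MLE of (p_(1), ..., p_(k)) from counts = [n_0, ..., n_k].

    Requires all n_i > 0. Returns the ordered estimates from (3.4), or
    None when polynomial (3.5) does not have k nonnegative real roots.
    Hypothetical helper; `tol` is an assumed numerical tolerance.
    """
    counts = np.asarray(counts, dtype=float)
    if np.any(counts <= 0):
        raise ValueError("this sketch covers only the case of all n_i > 0")
    # Coefficient of x^(k-i) in (3.5) is (-1)^i n_i, highest degree first.
    coeffs = (-1.0) ** np.arange(len(counts)) * counts
    roots = np.roots(coeffs)
    if np.any(np.abs(roots.imag) > tol) or np.any(roots.real < 0):
        return None
    thetas = np.sort(roots.real)
    return thetas / (1.0 + thetas)  # equation (3.4)

# Frequencies proportional to the distribution with p = (0.2, 0.5, 0.8).
print(mle_by_invariance([8, 42, 42, 8]))
# x^3 - x^2 + x - 1 = (x - 1)(x^2 + 1) has complex roots, so this fails.
print(mle_by_invariance([1, 1, 1, 1]))
```

In the first call the polynomial is $8x^3 - 42x^2 + 42x - 8$, with roots 0.25, 1 and 4, which map back to 0.2, 0.5 and 0.8. Exact repeated roots may be reported with small spurious imaginary parts, so the tolerance would need more care in serious use.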


Thus,  for  large  samples,  we  expect  to  be  in  the  case  considered  above, 
and  we  expect  to  be  able  to  identify  the  MLE  by  equations  (3.4).  We 
briefly  sketch  a proof  of  (3.6).  The  fact  that 

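$$
P\Bigl(\prod_{i=0}^{k} n_i > 0\Bigr) \to 1
$$

is clear from the Bonferroni inequality, since

$$
P\Bigl(\bigcap_{i=0}^{k} \{n_i \ne 0\}\Bigr)
\ge 1 - \sum_{i=0}^{k} P(n_i = 0)
= 1 - \sum_{i=0}^{k} \bigl(1-P(Y=i)\bigr)^n .
$$

This latter expression clearly tends to one as $n$ tends to infinity. Since

$$
P\Bigl(p(x) \text{ has } k \text{ roots} > 0 \Bigm| \prod_{i=0}^{k} n_i > 0\Bigr)
- P\bigl(p(x) \text{ has } k \text{ roots} > 0\bigr)
$$

tends to zero as $n$ tends to infinity, it suffices to show that

$$
P\bigl(p(x) \text{ has } k \text{ roots} > 0\bigr) \to 1.
\tag{3.7}
$$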


To see this, we focus on the polynomial

$$
f(x) =
\begin{cases}
\dfrac{1}{n_0}\, p(x) & \text{if } n_0 > 0 \\
1 & \text{if } n_0 = 0.
\end{cases}
$$

The coefficients of $f(x)$ are consistent estimates of the elementary
symmetric functions of the ratios
$\bigl\{\theta_{(i)} = p_{(i)}/(1-p_{(i)}),\ i=1,\ldots,k\bigr\}$. Since
$P(n_0 > 0) \to 1$, we have that for each fixed $x$, $f(x)$ will converge
in probability to the polynomial with roots
$\theta_{(1)},\ldots,\theta_{(k)}$. If
$p_{(1)} < p_{(2)} < \cdots < p_{(k)}$, these $k$ roots are distinct. It is
thus possible to choose points $x_i$, $i=0,\ldots,k$, such that
$x_{i-1} < \theta_{(i)} < x_i$, and the limiting polynomial takes
alternating signs at successive $x$'s. We may choose $N$ sufficiently large
so that, for $n \ge N$, $f(x)$ has alternating signs at successive $x$'s
with arbitrarily high probability, establishing (3.7), which implies (3.6).
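The Bonferroni bound used in the proof sketch is easy to evaluate for given parameter values, which gives a rough feel for how large $n$ must be before all $k+1$ cells are likely to be occupied. The following is an illustrative sketch only; the function name `bonferroni_lower_bound` is hypothetical.

```python
import numpy as np

def bonferroni_lower_bound(p, n):
    """Lower bound 1 - sum_i (1 - P(Y=i))^n on P(all n_i > 0).

    Hypothetical helper; the bound can be negative (and so uninformative)
    when n is small or some P(Y=i) is tiny.
    """
    pmf = np.array([1.0])
    for p_i in p:
        # Build P(Y=i) by convolving in one Bernoulli component at a time.
        pmf = np.convolve(pmf, [1.0 - p_i, p_i])
    return 1.0 - np.sum((1.0 - pmf) ** n)

for n in (10, 50, 100):
    print(n, bonferroni_lower_bound([0.2, 0.8], n))
```

The bound increases with $n$ for fixed parameters, but it approaches one slowly when some $P(Y=i)$ is close to zero, for example when several $p_i$ are very small or very large.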

It is not possible to obtain the same result if the parameter vector is on
the boundary of the parameter space. In that case, the limiting polynomial
mentioned above has some roots of multiplicity greater than one. It is
possible that a sequence of polynomials converges to such a polynomial,
and yet no polynomial in the sequence has any real roots. For example, the
polynomials
$\{g_n(x) = x^4 - (6 - \tfrac{1}{n})x^3 + 13x^2 - 12x + 4\}$ have no real
roots, yet converge to the polynomial $g(x) = (x-1)^2 (x-2)^2$. It is a
fortunate fact, however, that a sequence of random polynomials does not
behave like a sequence of deterministic polynomials. Thus for large $n$,
we find that the MLE may be identified from the roots of the polynomial
(3.5) with reasonable frequency even in the case of a boundary parameter
vector. The simulation results summarized in the next section will make
this remark clearer. Thus, although the method proposed here does not
succeed in identifying the MLE with limiting probability one for boundary
parameter vectors as it does for vectors in the interior of the parameter
space, the method may still be attempted and will produce the MLE with
some positive probability, the exact value of which depends on the exact
form of the limiting polynomial.
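The deterministic example can be checked numerically. The sketch below assumes the family is $g_n(x) = x^4 - (6 - 1/n)x^3 + 13x^2 - 12x + 4$, that is, $(x-1)^2(x-2)^2 + x^3/n$; this is a reading of the example in the text rather than something the source states unambiguously, and the helper name `g_roots` is hypothetical.

```python
import numpy as np

def g_roots(n):
    """Roots of x^4 - (6 - 1/n) x^3 + 13 x^2 - 12 x + 4 (assumed form of g_n)."""
    # Coefficients are listed from the highest degree down, as numpy.roots expects.
    return np.roots([1.0, -(6.0 - 1.0 / n), 13.0, -12.0, 4.0])

for n in (1, 10, 100, 1000):
    roots = g_roots(n)
    # A strictly positive minimum |imaginary part| means no root is real.
    print(n, np.min(np.abs(roots.imag)))
```

Under this reading, $g_n(x) > 0$ for every real $x$: the square term is nonnegative and $x^3/n$ is positive for $x > 0$, while for $x \le 0$ the square term dominates. The computed roots therefore stay off the real axis while drifting toward the double roots 1 and 2 of the limit.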
While it is true that $P\bigl(\prod_{i=0}^{k} n_i > 0\bigr)$ tends to 1 as
$n \to \infty$ provided $0 < p_{(1)} < p_{(k)} < 1$, the speed of this
convergence will depend on the exact size of the parameters. There are
cases of practical importance in which maximum likelihood estimation is of
interest for sample sizes for which
$P\bigl(\prod_{i=0}^{k} n_i = 0\bigr)$ is quite high. Of particular
importance are problems in which several $p_i$'s are very large and/or
several $p_i$'s are very small. As Buehler (1957) has noted, such problems
occur with considerable frequency in reliability experiments. It is
interesting to note that the method proposed here for the case
$\prod_{i=0}^{k} n_i > 0$ tends to work nicely for problems in which the
integers with nonzero observed frequencies are consecutive, which is
precisely the expected data configuration for the problem of interest. We
summarize below the details of the extension of the method to this
problem.

Let us suppose that the observed frequencies from a sample
$Y_1,\ldots,Y_n$ are as follows:


where $0 \le r < s \le k$, and $n_{r-1} = 0 = n_{s+1}$. Then, an attempt to
use the invariance property of MLE's to identify the MLE for
$(p_{(1)},\ldots,p_{(k)})$, that is, an attempt to solve the system (3.2),
yields

$$
\hat p_{(1)} = \cdots = \hat p_{(k-s)} = 0
$$

and

$$
\hat p_{(k-r+1)} = \cdots = \hat p_{(k)} = 1,
$$

with the remaining estimates being identified as

$$
\hat p_{(j)} = \frac{\hat\theta_{(j)}}{1+\hat\theta_{(j)}},
\qquad j = k-s+1,\ldots,k-r,
$$

where $\hat\theta_{(k-s+1)},\ldots,\hat\theta_{(k-r)}$ are the ordered
roots of the polynomial

$$
p(x) = \sum_{i=r}^{s} n_i\, (-1)^{i-r}\, x^{s-i},
\tag{3.9}
$$

provided this polynomial has $(s-r)$ nonnegative roots. For the case
$k = 2$ considered in the last section, the polynomial (3.9) is linear in
the two cases with a zero observed frequency and consecutive integers with
nonzero frequencies (labeled Cases 4 and 6 there).
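The extension to consecutive nonzero frequencies can be sketched in the same way as the all-positive case. The code below is an illustration under stated assumptions rather than the authors' procedure: the name `mle_consecutive` is hypothetical, `numpy.roots` is used as the root finder, the tolerance is arbitrary, and the degenerate situation $r = s$ (a single nonzero frequency) is handled by returning only zeros and ones.

```python
import numpy as np

def mle_consecutive(counts, tol=1e-9):
    """Attempt the MLE when the nonzero n_i are n_r, ..., n_s (consecutive).

    Returns (p_(1), ..., p_(k)) in increasing order, or None when the
    nonzero frequencies are not consecutive or polynomial (3.9) lacks
    s - r nonnegative real roots. Hypothetical helper for illustration.
    """
    counts = np.asarray(counts, dtype=float)
    if counts.sum() <= 0:
        raise ValueError("at least one frequency must be positive")
    k = len(counts) - 1
    nonzero = np.flatnonzero(counts > 0)
    r, s = int(nonzero[0]), int(nonzero[-1])
    if len(nonzero) != s - r + 1:
        return None  # a zero frequency lies strictly between r and s
    # Coefficient of x^(s-i) in (3.9) is (-1)^(i-r) n_i, highest degree first.
    coeffs = (-1.0) ** np.arange(s - r + 1) * counts[r:s + 1]
    roots = np.roots(coeffs) if s > r else np.array([])
    if np.any(np.abs(roots.imag) > tol) or np.any(roots.real < 0):
        return None
    thetas = np.sort(roots.real)
    # k - s estimates equal 0, s - r come from the roots, r equal 1.
    return np.concatenate([np.zeros(k - s), thetas / (1.0 + thetas), np.ones(r)])

print(mle_consecutive([0, 30, 20]))  # n_0 = 0: p_(2) = 1 and p_(1) = 20/50
print(mle_consecutive([20, 30, 0]))  # n_2 = 0: p_(1) = 0 and p_(2) = 30/50
print(mle_consecutive([10, 0, 10]))  # nonzero frequencies not consecutive: None
```

For $k = 2$ the first two calls correspond to the linear cases mentioned above, where the single free estimate reduces to a sample proportion.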


IV.  DISCUSSION 

Maximum likelihood estimation for the parameters of the model
$\mathop{\ast}_{i=1}^{k} B(1,p_i)$ is a complex problem in which many
different likelihood surfaces must be examined, and for which no closed
form solution is possible in general. While numerical methods are always
available for searching for MLE's, they tend to be quite unwieldy in
multiparameter problems. In Section III, we have demonstrated that the MLE
may be found with high probability from the roots of a $k$th degree
polynomial when $n$ is large and the $k$ parameters are distinct. This
leaves, of course, the numerical problem of obtaining the roots of this
polynomial, but this problem is easily solved using standard techniques.
It is a substantially simpler numerical problem than the problem of
"hill-climbing" with a $k$-variate criterion function.

The results discussed in Section III are asymptotic in character, and it
is of interest to examine the question of sample size requirements. We
present below the result of a very modest simulation study; we hope to
report on the results of a more ambitious simulation in a future note. For
the present, we examine only the case $k = 2$ and the sample size
$n = 50$. We conclude from our simulation that $n = 50$ is a "large
sample" in terms of the level of probability experienced in identifying
the MLE by the method of Section III.

In Table II below, we give, for different parameter values, the frequency
of occurrence of Cases 1-8 (see Table I) in 100 samples of size 50 drawn
from the convolution $B(1,p_1) \ast B(1,p_2)$. The last column tabulates
the frequency of occurrence of samples for which integers $i$ with nonzero
$n_i$ are consecutive and the MLE could be identified by the invariance
principle.


TABLE II


With a sample of size 50 in a two-component system, the likelihood of
obtaining the MLE by the invariance principle ranges (in our simulation)
from 44% to 100%, the higher likelihoods being associated with parameter
values that are fairly well separated. We see that the invariance
principle is not highly reliable when $p_{(1)} = p_{(2)}$, as might be
anticipated by our remarks in the previous section. However, there is a
reasonable chance of obtaining the MLE by invariance even in this case.
For fixed $p_{(1)} = p_{(2)}$, this likelihood should be about the same
regardless of the sample size.

One further observation: while our simulation did not involve
$p_{(1)} < .1$ or $p_{(2)} > .9$, it is clear that the relative frequency
with which the MLE may be obtained by invariance tends to one as either
$p_{(1)} \to 0$ or $p_{(2)} \to 1$, since in either of these
circumstances, the probability of the set of cases (1, 2, 3, 4, 6, 8) in
which the invariance principle works tends to one.
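A simulation of the kind described above can be sketched as follows. This is not the authors' program and it makes no attempt to reproduce the entries of Table II. The function names, the seed, the root-finding routine and the tolerance are all assumptions made for illustration, and a sample with a single nonzero frequency is counted as a success.

```python
import numpy as np

def invariance_succeeds(counts, tol=1e-9):
    """True when the nonzero n_i are consecutive and (3.9) has s - r
    nonnegative real roots. Hypothetical helper; `tol` is assumed."""
    counts = np.asarray(counts, dtype=float)
    nonzero = np.flatnonzero(counts > 0)
    r, s = int(nonzero[0]), int(nonzero[-1])
    if len(nonzero) != s - r + 1:
        return False
    if s == r:
        return True  # only zeros and ones are estimated; no polynomial needed
    coeffs = (-1.0) ** np.arange(s - r + 1) * counts[r:s + 1]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots.imag) <= tol) and np.all(roots.real >= 0))

def success_frequency(p, n=50, reps=100, seed=0):
    """Fraction of `reps` simulated samples of size n for which the
    invariance approach identifies an estimate."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    k = len(p)
    hits = 0
    for _ in range(reps):
        # Each row holds k independent Bernoulli draws; row sums are the Y's.
        y = (rng.random((n, k)) < p).sum(axis=1)
        counts = np.bincount(y, minlength=k + 1)
        hits += invariance_succeeds(counts)
    return hits / reps

print(success_frequency([0.2, 0.8]))  # well-separated parameters
print(success_frequency([0.5, 0.5]))  # boundary case p_(1) = p_(2)
```

The defaults mirror the design described in the text (100 samples of size 50), but the frequencies printed depend on the random seed and are not expected to match the published table.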